Detecting CD8 Tpex with ProjecTILs
In this tutorial, we will present how to use ProjecTILs to detect Tpex in public datasets that were not detecting them. This CD8 human TILs reference was built using Nick Borcherding collection of single-cell datasets from tumor patients. The map consists of 11,021 high-quality single-cell transcriptomes from 20 samples, covering 7 tumor types.
The process and code to build the map can be found in this Github repo.
Briefly, this reference was built using curated CD8+ T cells from 20 tumor-infiltrating patients, and integrated using semi-supervised STACAS. Unsupervised clustering was performed on the integrated samples. Clusters were manually annotated with respect to classical immunology markers, with a special interest in detecting Tpex. Clusters were later downsampled to have at most 2000 cells per cluster. In our experience, this optimized cluster balance improves projection, as reported here and here. Finally, this map was converted as a ProjecTILs reference. Please note that, as this reference was constructed with tumor-infiltrating samples, it might not work perfectly when mapping other tissues, such as blood or draining lymph nodes (DLN).
Why doing projection?
Projection classifies cells using a well curated
ProjecTILs reference map.
This method has the benefit of using the same cell types across projects, which is especially beneficial when analyzing huge collections of datasets.
This reference is also a way to annotate datasets only by stable cell types, and not transient cell states, like cell cycle or activation.
Results will be comparable across conditions, as there is little manual tuning. For instance, there is no need for steps that can have a high impact, such as the selection of highly variable genes.
Projection is robust to batch effects, like single-cell technologies or sequencing depth (for more information, please read ProjecTILs paper - Andreatta et al. 2021).
What happens if some cells are not covered by the reference
If some cells are not covered by the reference, they automatically get filtered-out (eg. CD4 T cells if the reference is CD8 T cells).
By default, both Run.ProjecTILs() and
ProjecTILs.classifier() have the parameter
filter.cells set as TRUE. This means that
cells out of reference will be filtered-out using the built-in scGate
model. This model is stored in the slot misc of the reference Seurat
object: ref@misc$scGate. You can custom this filtering by
amending this slot using scGate
grammar.
Human CD8 TIL reference
First, let’s have a look at the reference map.
# Load the reference
options(timeout = max(900, getOption("timeout")))
#download.file("https://figshare.com/ndownloader/files/38921366", destfile = "CD8T_human_ref_v1.rds")
ref.cd8 <- load.reference.map("CD8T_human_ref_v1.rds")## [1] "Loading Custom Reference Atlas..."
## [1] "Loaded Custom Reference map Human CD8 TILs"
# Setup colors
mycols <- ref.cd8@misc$atlas.palette
# DimPlot
DimPlot(ref.cd8, group.by = 'functional.cluster', label = T, repel = T, cols = mycols) + theme(aspect.ratio = 1)Let’s display 6 positive and 6 negative TPEX markers.
DefaultAssay(ref.cd8) <- 'RNA'
FeaturePlot(ref.cd8, features = c('TCF7','XCL1','XCL2',"CXCL13","TOX","CRTAM","GZMB","NKG7","CCL5","HAVCR2","PRF1","GZMA"), ncol = 3, pt.size = 0.5, order = T, cols = pals::coolwarm()) & NoLegend()Here are the different T cell subsets defined in the map:
CD8.NaiveLike: Antigen-naive T cells
CD8.CM: Central Memory T cells
CD8.EM: Effector Memory
CD8.TEMRA: Effector Memory cells re-expressing CD45RA. Sometimes called Short Lived Effectors (SLEC), or Cytotoxic effectors
CD8.TPEX: Progenitors exhausted T cells
CD8.TEX: Exhausted T cells
CD8.MAIT: MAIT cells, innate-like T cells defined by their semi-invariant αβ T cell receptor (TCR)
Let’s check Differentially Expressed Genes (DEGs) between each cluster, to confirm cluster marker genes.
# Compute DEGs
DefaultAssay(ref.cd8) <- "RNA"
ref.cd8 <- NormalizeData(ref.cd8)
markers <- FindAllMarkers(object = ref.cd8, only.pos = TRUE, assay = "RNA")
# Remove TCR genes
tcr.genes <- SignatuR::GetSignature(SignatuR$Hs$Compartments$TCR)
markers <- markers %>% filter(!gene %in% unname(tcr.genes))
markers %>% group_by(cluster) %>% top_n(n = 3, wt = avg_log2FC) -> top3
# Plot heatmap
VlnPlot(ref.cd8, assay = "RNA", features = top3$gene, cols = mycols, stack = T, flip = T, fill.by = "ident") + NoLegend()Progenitors exhauted (Tpex)
Pioneering work in the murine lymphocytic choriomeningitis virus (LCMV) model has mapped the molecular and phenotypic profiles of CD8+ T cells, revealing progenitors of exhausted T cells (TPEX), defined by the expression of transcription factors, TOX and TCF1. These cells arise in the acute-phase of infection and sustain terminally exhausted subsets over the long-term. As they are thought to renew the pool of terminally exhausted cells, an increasing number of reports show that this population is of primary importance for cancer immunotherapy. (Utzschneideret al., Siddiqui et al.).
Here are some marker genes to help identify them:
Positive markers: TCF7, CD200, CRTAM, GNG4, TOX, LEF1, CCR7, CXCL13, XCL1, XCL2
Negative markers: GZMB, NKG7, PRF1, HAVCR2, CCL5, GZMA
Tumor T cell differentiation model, going through an intermediate state of progenitor exhausted (Tpex). Figure from Andreatta et al. 2021.
Tpex importance is of growing interest, but except a few studies (Oliveira et al., Magen et al., Zheng et al.), this subset has been harder to detect in human. Having a human CD8 reference with clearly annotated Tpex solves this issue.
Detecting Tpex in Gueguen et al. 2021
Setup data
#download.file("https://figshare.com/ndownloader/files/39082049", destfile = "gueguen.cd3.Rds")
gueguen.cd3 <- readRDS("gueguen.cd3.Rds")
gueguen.cd3$seurat_clusters <- Idents(gueguen.cd3)Projection
Thanks to automatic scGate filtering, only the CD8 clusters (upper part of the UMAP) were mapped.
# Projection
DefaultAssay(gueguen.cd3) <- "RNA"
gueguen.cd3 <- ProjecTILs.classifier(gueguen.cd3, ref = ref.cd8, filter.cells = T, split.by = 'orig.ident', ncores = 6)
table(gueguen.cd3$functional.cluster)##
## CD8.CM CD8.EM CD8.MAIT CD8.NaiveLike CD8.TEMRA
## 2132 3181 147 238 276
## CD8.TEX CD8.TPEX
## 3340 337
DimPlot(gueguen.cd3, order = T, label = T, repel = T) DimPlot(gueguen.cd3, group.by = 'functional.cluster', order = T, cols = mycols, label = T, repel = T)We can check specific TPEX genes to confirm that identities match.
DefaultAssay(gueguen.cd3) <- 'RNA'
FeaturePlot(gueguen.cd3, features = c('XCL1','XCL2'), ncol = 2, pt.size = 0.5, order = T, cols = pals::coolwarm()) & NoLegend()# Radar plots
p <- plot.states.radar(ref.cd8, query = gueguen.cd3, min.cells = 10, genes4radar = c('LEF1', "TCF7", "CCR7", "GZMK", "FGFBP2",'FCGR3A','ZNF683','ITGAE', "CRTAM", "CD200",'GNG4', "HAVCR2", "TOX", "ENTPD1", 'TYROBP','KIR2DL1'), return = T)
wrap_plots(p) + theme_bw()We can see that the previously homogeneous cluster CD8-LAYN seems to be composed of two subsets: CD8.TEX and CD8.TPEX.
How to assess quality/robustness of mapping
It can be hard to make the call between cells modified, and cells plainly wrongly mapped. We usually recommend assessing mapping consistency by checking consistency among top markers. If the query seems quite different from the reference, we recommend understanding DEGs between the reference and the query, for each cell type of interest.
Detecting Tpex in Yost et al. 2019
Setup data
# Load data
#download.file("https://figshare.com/ndownloader/files/39109277", destfile = "Yost.cd3.Rds")
Yost.cd3 <- readRDS("Yost.cd3.Rds")
# Normalize data
Yost.cd3 <- NormalizeData(Yost.cd3)
Yost.cd3 <- ScaleData(Yost.cd3)## Centering and scaling data matrix
# DimPlots
DimPlot(Yost.cd3, reduction = 'umap', group.by = 'cluster', label = T)DimPlot(Yost.cd3, reduction = 'umap', group.by = 'patient', label = T, repel = T)Projection
As this dataset is a mix between CD4 and CD8 T cells, we will keep
the parameter filter.cells as TRUE to keep only CD8+ T
cells.
DefaultAssay(Yost.cd3) <- "RNA"
Yost.cd3 <- ProjecTILs.classifier(Yost.cd3, ref = ref.cd8, filter.cells = T, split.by = 'patient', ncores = 6)
table(Yost.cd3$functional.cluster)##
## CD8.CM CD8.EM CD8.MAIT CD8.NaiveLike CD8.TEMRA
## 3806 5296 316 438 965
## CD8.TEX CD8.TPEX
## 2370 499
DimPlot(Yost.cd3, group.by = 'functional.cluster', order = T, cols = mycols, label = T, repel = T)We indeed detect TPEX. Let’s check globally how the expression profiles look.
# Radar plots
p <- plot.states.radar(ref.cd8, query = Yost.cd3, min.cells = 10, genes4radar = c('LEF1', "TCF7", "CCR7", "GZMK", "FGFBP2",'FCGR3A','ZNF683','ITGAE', "CRTAM", "CD200",'GNG4', "HAVCR2", "TOX", "ENTPD1", 'TYROBP','KIR2DL1'), return = T)
wrap_plots(p) + theme_bw()We can see that in Yost et al. dataset, CD8.TPEX can be found with profiles matching the reference. TPEX are found scattered all over the original UMAP space. This is because activation signals and other confounding factors were contributing to defining UMAP space. Reference-based annotation uncovers cell types behind the activation program. If you are interested in recovering cell types hidden by transient cell states, you can read more in the corresponding tutorial.
Session Info
sessionInfo()## R version 4.2.1 (2022-06-23)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Ventura 13.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices datasets utils methods base
##
## other attached packages:
## [1] plotly_4.10.1 EnhancedVolcano_1.14.0 pheatmap_1.0.12
## [4] UCell_2.2.0 scales_1.2.1 ggrepel_0.9.2
## [7] SignatuR_0.1.0 gridExtra_2.3 ProjecTILs_3.0.2
## [10] patchwork_1.1.2 GEOquery_2.66.0 Biobase_2.58.0
## [13] BiocGenerics_0.44.0 data.table_1.14.6 STACAS_2.0.0
## [16] scGate_1.4.1 forcats_0.5.2 stringr_1.5.0
## [19] dplyr_1.0.10 purrr_1.0.1 readr_2.1.3
## [22] tidyr_1.2.1 tibble_3.1.8 ggplot2_3.4.0
## [25] tidyverse_1.3.2 SeuratObject_4.1.3 Seurat_4.3.0
##
## loaded via a namespace (and not attached):
## [1] utf8_1.2.2 spatstat.explore_3.0-5
## [3] reticulate_1.27 tidyselect_1.2.0
## [5] htmlwidgets_1.6.1 grid_4.2.1
## [7] BiocParallel_1.32.5 Rtsne_0.16
## [9] munsell_0.5.0 codetools_0.2-18
## [11] ica_1.0-3 umap_0.2.9.0
## [13] future_1.30.0 miniUI_0.1.1.1
## [15] withr_2.5.0 spatstat.random_3.1-2
## [17] colorspace_2.1-0 progressr_0.13.0
## [19] highr_0.10 knitr_1.41
## [21] rstudioapi_0.14 stats4_4.2.1
## [23] SingleCellExperiment_1.20.0 ROCR_1.0-11
## [25] tensor_1.5 listenv_0.9.0
## [27] labeling_0.4.2 MatrixGenerics_1.10.0
## [29] GenomeInfoDbData_1.2.9 polyclip_1.10-4
## [31] farver_2.1.1 parallelly_1.34.0
## [33] vctrs_0.5.2 generics_0.1.3
## [35] xfun_0.36 timechange_0.2.0
## [37] R6_2.5.1 GenomeInfoDb_1.34.7
## [39] rmdformats_1.0.4 pals_1.7
## [41] bitops_1.0-7 spatstat.utils_3.0-1
## [43] cachem_1.0.6 DelayedArray_0.24.0
## [45] assertthat_0.2.1 promises_1.2.0.1
## [47] googlesheets4_1.0.1 gtable_0.3.1
## [49] globals_0.16.2 goftest_1.2-3
## [51] rlang_1.0.6 splines_4.2.1
## [53] lazyeval_0.2.2 gargle_1.2.1
## [55] dichromat_2.0-0.1 spatstat.geom_3.0-5
## [57] broom_1.0.2 BiocManager_1.30.19
## [59] yaml_2.3.7 reshape2_1.4.4
## [61] abind_1.4-5 modelr_0.1.10
## [63] backports_1.4.1 httpuv_1.6.8
## [65] tools_4.2.1 bookdown_0.32
## [67] ellipsis_0.3.2 jquerylib_0.1.4
## [69] RColorBrewer_1.1-3 ggridges_0.5.4
## [71] Rcpp_1.0.10 plyr_1.8.8
## [73] zlibbioc_1.44.0 RCurl_1.98-1.9
## [75] openssl_2.0.5 deldir_1.0-6
## [77] pbapply_1.7-0 cowplot_1.1.1
## [79] S4Vectors_0.36.1 zoo_1.8-11
## [81] SummarizedExperiment_1.28.0 haven_2.5.1
## [83] cluster_2.1.4 fs_1.6.0
## [85] magrittr_2.0.3 RSpectra_0.16-1
## [87] scattermore_0.8 lmtest_0.9-40
## [89] reprex_2.0.2 RANN_2.6.1
## [91] googledrive_2.0.0 fitdistrplus_1.1-8
## [93] matrixStats_0.63.0 hms_1.1.2
## [95] mime_0.12 evaluate_0.20
## [97] xtable_1.8-4 readxl_1.4.1
## [99] IRanges_2.32.0 compiler_4.2.1
## [101] maps_3.4.1 KernSmooth_2.23-20
## [103] crayon_1.5.2 htmltools_0.5.4
## [105] later_1.3.0 tzdb_0.3.0
## [107] lubridate_1.9.0 DBI_1.1.3
## [109] dbplyr_2.3.0 MASS_7.3-58.2
## [111] data.tree_1.0.0 Matrix_1.5-3
## [113] cli_3.6.0 parallel_4.2.1
## [115] igraph_1.3.5 GenomicRanges_1.50.2
## [117] pkgconfig_2.0.3 sp_1.6-0
## [119] spatstat.sparse_3.0-0 xml2_1.3.3
## [121] bslib_0.4.2 XVector_0.38.0
## [123] rvest_1.0.3 digest_0.6.31
## [125] pracma_2.4.2 sctransform_0.3.5
## [127] RcppAnnoy_0.0.20 spatstat.data_3.0-0
## [129] rmarkdown_2.20 cellranger_1.1.0
## [131] leiden_0.4.3 uwot_0.1.14
## [133] shiny_1.7.4 lifecycle_1.0.3
## [135] nlme_3.1-161 jsonlite_1.8.4
## [137] BiocNeighbors_1.16.0 mapproj_1.2.9
## [139] askpass_1.1 viridisLite_0.4.1
## [141] limma_3.54.0 fansi_1.0.4
## [143] pillar_1.8.1 lattice_0.20-45
## [145] fastmap_1.1.0 httr_1.4.4
## [147] survival_3.5-0 glue_1.6.2
## [149] png_0.1-8 stringi_1.7.12
## [151] sass_0.4.5 renv_0.15.5
## [153] irlba_2.3.5.1 future.apply_1.10.0